Abstract

We study bracketing numbers for spaces of bounded convex functions in the
$L_p$ norms. We impose no Lipschitz constraint. Previous results gave bounds
when the domain of the functions is a hyperrectangle. We extend these results
to the case wherein the domain is a polytope. Bracketing numbers are
crucial quantities for understanding the asymptotic behavior of many
nonparametric statistical estimators. Our results are of particular interest in
multidimensional estimation problems based on convexity shape constraints.

1 Introduction and Motivation

To quantify the size of an infinite dimensional set, the pioneering work of Kolmogorov and Tihomirov (1961) studied the metric covering number of the set and its logarithm, the metric entropy. Metric entropy quantifies the amount of information it takes to recover any element of a set with a given accuracy $\epsilon$. This quantity is important in many areas of statistics and information theory; in particular, the asymptotic behavior of empirical processes and thus of many statistical estimators is fundamentally tied to the entropy of the class under consideration (Dudley, 1978).

In this paper, we are interested not in the metric entropy but in the related bracketing entropy for a class of functions. Let $\mathcal{F}$ be a set of functions and let $d$ be a metric on $\mathcal{F}$. We call a pair of functions $[l, u]$ a bracket if $l \le u$ pointwise. For $\epsilon > 0$, the $\epsilon$-bracketing number of $\mathcal{F}$, denoted $N_{[\,]}(\epsilon, \mathcal{F}, d)$, is the smallest $N$ such that there exist brackets $[l_i, u_i]$, $i = 1, \dots, N$, each of size $d(l_i, u_i) \le \epsilon$, such that for all $f \in \mathcal{F}$, there exists $i$ with $l_i(x) \le f(x) \le u_i(x)$ for all $x$. Like metric entropies, bracketing entropies are fundamentally tied to rates of convergence of certain estimators (see e.g., Birgé and Massart (1993), van der Vaart and Wellner (1996), van de Geer (2000)).
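The definition of a bracketing cover can be illustrated on a toy example. The following is only a sketch under stated assumptions: the family $\{x \mapsto c x^2\}$ on $[0,1]$, the midpoint grid, and the helper names (`midpoint_grid`, `lp_size`, `covers`, `quad`) are all hypothetical and chosen for illustration; the check is performed on a finite grid rather than for all $x$.

```python
from typing import Callable, List, Sequence, Tuple

Func = Callable[[float], float]
Bracket = Tuple[Func, Func]


def midpoint_grid(n: int) -> List[float]:
    """Midpoints of n equal cells of [0, 1] (illustrative discretization)."""
    return [(m + 0.5) / n for m in range(n)]


def lp_size(lower: Func, upper: Func, grid: Sequence[float], p: float) -> float:
    """Midpoint-rule approximation of (int_0^1 |u - l|^p dx)^(1/p)."""
    h = 1.0 / len(grid)
    return (sum(abs(upper(x) - lower(x)) ** p for x in grid) * h) ** (1.0 / p)


def is_bracketed(f: Func, lower: Func, upper: Func, grid: Sequence[float]) -> bool:
    """Check l(x) <= f(x) <= u(x) at every grid point."""
    return all(lower(x) <= f(x) <= upper(x) for x in grid)


def covers(family: Sequence[Func], brackets: Sequence[Bracket],
           grid: Sequence[float]) -> bool:
    """Every f in the family must lie in at least one bracket."""
    return all(any(is_bracketed(f, l, u, grid) for (l, u) in brackets)
               for f in family)


def quad(c: float) -> Func:
    """The convex function x -> c * x^2 (toy family, an assumption)."""
    return lambda x: c * x * x


grid = midpoint_grid(1000)
family = [quad(c) for c in (0.0, 0.25, 0.5, 0.75, 1.0)]
brackets = [(quad(0.0), quad(0.5)), (quad(0.5), quad(1.0))]
covered = covers(family, brackets, grid)
size_first = lp_size(quad(0.0), quad(0.5), grid, 1.0)  # close to 1/6
```

Here two brackets of $L_1$ size about $1/6$ cover the five-element family, so its $\epsilon$-bracketing number is at most 2 for any $\epsilon \ge 1/6$ (on this grid).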

In this paper, we study the bracketing entropy of classes of convex functions. Our interest is motivated by the study of nonparametric estimation of functions satisfying convexity restrictions, such as the least-squares estimator of a convex or concave regression function on $\mathbb{R}^d$ (e.g., Seijo and Sen (2011), Guntuboyina and Sen (2015)), possibly in the high dimensional setting (Xu et al., 2014), or estimators of a log-concave or $s$-concave density (e.g., Seregin and Wellner (2010), Koenker and Mizera (2010), Kim and Samworth (2014), Doss and Wellner (2015a,b), among others). Bracketing entropy bounds are directly relevant for studying asymptotic behavior of estimators in these contexts.

Let $D \subseteq \mathbb{R}^d$ be a convex set, let $v_1, \dots, v_d \in \mathbb{R}^d$ be linearly independent vectors, let $B, \Gamma_1, \dots, \Gamma_d$ be positive reals, and let $v = (v_1, \dots, v_d)$ and $\Gamma = (\Gamma_1, \dots, \Gamma_d)$. Then we let $\mathcal{C}(D, B, \Gamma, v)$ be the class of convex functions $\varphi$ defined on $D$ such that $|\varphi(x)| \le B$ for all $x \in D$, and such that $|\varphi(x + \lambda v_i) - \varphi(x)| \le \Gamma_i |\lambda|$ as long as $x$ and $x + \lambda v_i$ are both elements of $D$. Let $\mathcal{C}(D, B)$ be the convex functions on $D$ with uniform bound $B$ and no Lipschitz constraints. For $f : D \to \mathbb{R}$, let $L_p(f) = \left( \int_D |f(x)|^p \, dx \right)^{1/p}$ for $1 \le p < \infty$, and let $L_\infty(f) = \sup_{x \in D} |f(x)|$. Since convex functions are Lebesgue almost everywhere twice differentiable, their entropies correspond to the entropy for twice differentiable function classes, namely $\epsilon^{-d/2}$. When $D$ is a hyperrectangle and $B = 1$, $\Gamma_i = 1$, Bronshtein (1976) and Dudley (1984), chapter 8, indeed show that $\log N(\epsilon, \mathcal{C}(D, B, \Gamma), L_\infty) \lesssim \epsilon^{-d/2}$. Here, $N(\epsilon, \mathcal{F}, \rho)$ is the $\epsilon$-covering number of $\mathcal{F}$ in the metric $\rho$, i.e. the smallest number of balls of $\rho$-radius $\epsilon$ that cover $\mathcal{F}$.

Bracketing entropies govern the suprema of corresponding empirical processes and thus govern the rates of convergence of certain statistical estimators. In many problems, including some of the statistical ones mentioned above, the classes that arise do not naturally have Lipschitz constraints, and so the class $\mathcal{C}(D, B, \Gamma)$ is not of immediate use. Without Lipschitz constraints, the $L_\infty$ bracketing numbers are not bounded, but one can use the $L_p$ metrics, $1 \le p < \infty$, instead: Dryanov (2009) and Guntuboyina and Sen (2013) found bounds when $d = 1$ and $d > 1$, respectively, for metric entropies of $\mathcal{C}(D, 1)$: they found that $\log N(\epsilon, \mathcal{C}(D, 1), L_p) \lesssim \epsilon^{-d/2}$, again with $D$ a hyperrectangle. The $d = 1$ case (from Dryanov (2009)) was the fundamental building block in computing the rate of convergence of the univariate log-concave and $s$-concave MLEs in Doss and Wellner (2015a). In the corresponding statistical problems when $d > 1$, the domain of the functions under consideration is not a hyperrectangle but rather a polytope. Thus the results of Guntuboyina and Sen (2013) are not always immediately applicable, and there is need for results on more general convex domains $D$. It is not immediate that previous results will apply, since $D$ may have a complicated boundary. In this paper we find bracketing entropies for all polytopes $D$, attaining the bound

$$\log N_{[\,]}(\epsilon, \mathcal{C}(D, B), L_p) \lesssim \epsilon^{-d/2}$$

with $1 \le p < \infty$, $D$ a polytope, and $0 < B < \infty$. Note we work with bracketing entropy rather than metric entropy. Bracketing numbers dominate covering numbers, in the sense that $N(\epsilon, \mathcal{F}, L_p) \le N_{[\,]}(2\epsilon, \mathcal{F}, L_p)$ (van der Vaart and Wellner, 1996), so bracketing entropy bounds imply metric entropy bounds of the same order. Along the way, we also generalize the results of Bronshtein (1976) to bound the $L_\infty$ bracketing numbers of $\mathcal{C}(D, B, \Gamma)$ when $D$ is arbitrary. One of the benefits of our method is its constructive nature. We initially study only simple polytopes and in that case attempt to keep track of how constants depend on $D$.

This paper is organized as follows. In Section 2 we prove bounds for bracketing entropy of classes of convex functions with Lipschitz bounds, using the $L_\infty$ metric. We use these to prove our main result for the bracketing entropy of classes of convex functions without Lipschitz bounds in the $L_p$ metrics, $1 \le p < \infty$, which we do in Section 3. We defer some of the details of the proofs to Section 4.

2 Bracketing with Lipschitz Constraints 

If we have sets $D_i \subseteq \mathbb{R}^d$, $i = 1, \dots, M$, for $M \in \mathbb{N}$, and $D \subseteq \cup_{i=1}^M D_i$, then for $\epsilon_i > 0$,

$$N_{[\,]}\left( \Big( \sum_{i=1}^M \epsilon_i^p \Big)^{1/p}, \mathcal{C}(D, 1), L_p \right) \le \prod_{i=1}^M N_{[\,]}\left( \epsilon_i, \mathcal{C}(D, 1)|_{D_i}, L_p \right), \tag{1}$$

where, for a class of functions $\mathcal{F}$ and a set $G$, we let $\mathcal{F}|_G$ denote the class $\{ f|_G : f \in \mathcal{F} \}$, where $f|_G$ is the restriction of $f$ to the set $G$. We will apply (1) to a cover of $D$ by sets $G$ with the property that

$$\mathcal{C}(D, 1)|_G \subseteq \mathcal{C}(G, 1, \Gamma)$$

for some $\Gamma < \infty$, so that we can apply bracketing results for classes of convex functions with Lipschitz bounds. Thus, in this section, we develop the needed bracketing results for such Lipschitz classes, for arbitrary domains $G$. Recall $\mathcal{C}(D, B, \Gamma, v)$ is the class of convex functions $\varphi$ defined on $D$, uniformly bounded by $B$ and with Lipschitz parameter $\Gamma_i$ in the direction $v_i$. When the $v_i$ are the standard basis of $\mathbb{R}^d$, we just write $\mathcal{C}(D, B, \Gamma)$. When we have Lipschitz constraints on convex functions, we will see that the situation for forming brackets for $\mathcal{C}(D, 1, \Gamma)$ with $D \subseteq [0, 1]^d$ is essentially the same as for forming brackets for $\mathcal{C}([0, 1]^d, 1, \Gamma)$. For two sets $C, D \subseteq \mathbb{R}^d$, define the Hausdorff distance between them by

$$l_H(C, D) := \max\left( \sup_{x \in D} \inf_{y \in C} \|x - y\|, \; \sup_{y \in C} \inf_{x \in D} \|x - y\| \right).$$
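For finite point sets the Hausdorff distance can be computed directly from its definition. The sketch below is an illustration only, assuming the sets are given as finite lists of points (for convex bodies one would need, e.g., a discretization); the function name `hausdorff` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def hausdorff(C: Sequence[Point], D: Sequence[Point]) -> float:
    """Hausdorff distance between two finite point sets in R^d."""

    def directed(P: Sequence[Point], Q: Sequence[Point]) -> float:
        # sup over x in P of the distance from x to the set Q
        return max(min(math.dist(x, y) for y in Q) for x in P)

    return max(directed(C, D), directed(D, C))


C = [(0.0, 0.0), (1.0, 0.0)]
D = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
dist_CD = hausdorff(C, D)  # the point (1, 2) is at distance 2 from C
```

The two directed terms are generally different (here they are 0 and 2), which is why the maximum of both is taken.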

For $B > 0$ and a convex function $f$ defined on a convex set $D$, define the epigraph $V_B(f)$ by

$$V_B(f) := \left\{ (x_1, \dots, x_d, x_{d+1}) : (x_1, \dots, x_d) \in D, \; f(x_1, \dots, x_d) \le x_{d+1} \le B \right\}.$$

Bronshtein (1976) found entropy estimates in the Hausdorff distance for classes of $d$-dimensional convex sets (see also Dudley (1999), chapter 8). These entropy bounds for classes of convex sets are the main tool for Bronshtein (1976)'s entropy bounds for classes of convex functions, and they will also be the main tool in our bracketing bounds for convex functions (with Lipschitz constraints).

Theorem 2.1 (Bronshtein (1976)). For any $R > 0$ and any integer $d \ge 1$, there exist positive real numbers $c_d$ and $\epsilon_{0,d}$ such that for all $0 < \epsilon \le \epsilon_{0,d} R$ there is an $\epsilon$-cover of $\mathcal{K}^{d+1}(R)$ in the Hausdorff distance of cardinality not larger than $\exp\{ c_d (R/\epsilon)^{d/2} \}$.

The following lemma connects the Hausdorff distance on sets of epigraphs of Lipschitz functions to the supremum distance for those functions.

Lemma 2.1. Let $G \subseteq [0, 1]^d$ be any convex set and $B, \Gamma_1, \dots, \Gamma_d > 0$. For $f, g \in \mathcal{C}(G, B, (\Gamma_1, \dots, \Gamma_d))$,

$$\|f - g\|_\infty \le l_H(V_B(f), V_B(g)) \sqrt{1 + \sum_{i=1}^d \Gamma_i^2}.$$

Proof. For ease of notation, let $\rho = l_H(V_B(f), V_B(g))$. Fix $x \in G$ and suppose $f(x) < g(x)$, without loss of generality. Now, $(x, f(x)) \in V_B(f)$ so there exists $(x', y') \in V_B(g)$ such that $\|(x', y') - (x, f(x))\| \le \rho$. Since $f(x) < g(x)$, $(x, f(x))$ is outside the epigraph $V_B(g)$ so by convexity of $V_B(g)$, $y' = g(x')$. Thus

$$0 \le g(x) - f(x) = g(x) - g(x') + g(x') - f(x) \le \|x - x'\| \sqrt{\Gamma_1^2 + \cdots + \Gamma_d^2} + |g(x') - f(x)|,$$

since $|g(x) - g(x')| = |g(x_1, \dots, x_d) - g(x_1, \dots, x_{d-1}, x_d') + \cdots + g(x_1, x_2', \dots, x_d') - g(x_1', \dots, x_d')|$, which is bounded above by

$$|x_d - x_d'| \Gamma_d + \cdots + |x_1 - x_1'| \Gamma_1 \le \|x - x'\| \sqrt{\Gamma_1^2 + \cdots + \Gamma_d^2}$$

by the Cauchy-Schwarz inequality. Thus, again by Cauchy-Schwarz,

$$0 \le g(x) - f(x) \le \rho \sqrt{1 + \sum_{i=1}^d \Gamma_i^2},$$

as desired. □


Theorem 3.2 of Guntuboyina and Sen (2013) gives the following result when $D = \prod_{i=1}^d [a_i, b_i]$; we now extend it to the case of a general $D$. When we consider convex functions without Lipschitz constraints, we will partition $D$ into sets that are similar to parallelotopes. Note that if $P \subseteq R \subseteq \mathbb{R}^d$ where $R$ is a hyperrectangle and $P$ is a parallelotope defined by vectors $v_1, \dots, v_d$, then if $A$ is a linear map with $v_1, \dots, v_d$ as its eigenvectors (thus rescaling $R$), then $AR$ will not necessarily still be a hyperrectangle, i.e. its axes may no longer be orthogonal. Thus, we cannot argue by simple scaling arguments that bracketing numbers for $P$ scale with the lengths along the vectors $v_i$.

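Theorem 2.2. Let $a_i < b_i$ and let $D \subseteq \prod_{i=1}^d [a_i, b_i]$ be a convex set. Let $\Gamma = (\Gamma_1, \dots, \Gamma_d)$ and $0 < B, \Gamma_1, \dots, \Gamma_d < \infty$. Then there exist positive constants $c \equiv c_d$ and $\epsilon_0 \equiv \epsilon_{0,d}$ such that

$$\log N_{[\,]}\left( \epsilon \, \mathrm{Vol}(D)^{1/p}, \mathcal{C}(D, B, \Gamma), L_p \right) \le \log N_{[\,]}\left( \epsilon, \mathcal{C}(D, B, \Gamma), L_\infty \right) \le c \left( \frac{B + \sum_{i=1}^d \Gamma_i (b_i - a_i)}{\epsilon} \right)^{d/2}$$

for $0 < \epsilon \le \epsilon_0 \left( B + \sum_{i=1}^d \Gamma_i (b_i - a_i) \right)$ and $p \ge 1$.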

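Proof. The first inequality of the theorem is elementary. We will show the second. First we note the following scaling relationship. For $f \in \mathcal{C}(D, B, \Gamma)$ we can define $\tilde f : \tilde D \to \mathbb{R}$, where $\tilde D \subseteq [0, 1]^d$, by $\tilde f(t_1, \dots, t_d) = f(a_1 + t_1 (b_1 - a_1), \dots, a_d + t_d (b_d - a_d))$. Then $\tilde f \in \mathcal{C}\left( \tilde D, B, (\Gamma_1 (b_1 - a_1), \dots, \Gamma_d (b_d - a_d)) \right)$. This shows that

$$N_{[\,]}\left( \epsilon, \mathcal{C}\left( \tilde D, B, (\Gamma_1 (b_1 - a_1), \dots, \Gamma_d (b_d - a_d)) \right), L_\infty \right) = N_{[\,]}\left( \epsilon, \mathcal{C}(D, B, (\Gamma_1, \dots, \Gamma_d)), L_\infty \right). \tag{2}$$

Thus, we now let $a_i = 0$ and $b_i = 1$ and consider a convex domain $\tilde D \subseteq [0, 1]^d$. It is then clear if $f \in \mathcal{C}(\tilde D, B)$ that $V_B(f) \in \mathcal{K}^{d+1}(\sqrt{d + B^2})$, where

$$\mathcal{K}^{d+1}(R) = \{ D : D \text{ is a closed, convex set}, \; D \subseteq B(0, R) \}$$

for $R > 0$. Thus, given an $\epsilon / \big( 4 \sqrt{1 + \Gamma_1^2 + \cdots + \Gamma_d^2} \big)$-cover in Hausdorff distance of $\mathcal{K}^{d+1}(R)$ of $N$ elements $V_1, \dots, V_N$, we can pick $V_B(f_1), \dots, V_B(f_{\tilde N})$ for $\tilde N \le N$, such that $l_H(V_B(f_i), V_i) \le \epsilon / \big( 4 \sqrt{1 + \Gamma_1^2 + \cdots + \Gamma_d^2} \big)$, if such an $f_i \in \mathcal{C}(\tilde D, B, (\Gamma_1, \dots, \Gamma_d))$ exists. Then from Lemma 2.1, the $[f_i - \epsilon, f_i + \epsilon]$ form an $\epsilon$-bracketing set for $\mathcal{C}(\tilde D, B, (\Gamma_1, \dots, \Gamma_d))$. Thus, by Theorem 2.1, for some positive $c$, $\epsilon_0$,

$$\log N_{[\,]}\left( \epsilon, \mathcal{C}(\tilde D, B, (\Gamma_1, \dots, \Gamma_d)), L_\infty \right) \le c \left( \frac{\sqrt{(d + B^2)(1 + \Gamma_1^2 + \cdots + \Gamma_d^2)}}{\epsilon} \right)^{d/2}$$

for $0 < \epsilon \le \epsilon_0 \sqrt{(d + B^2)(1 + \Gamma_1^2 + \cdots + \Gamma_d^2)}$. Using (2), we see that

$$\log N_{[\,]}\left( \epsilon, \mathcal{C}(D, B, (\Gamma_1, \dots, \Gamma_d)), L_\infty \right) \le c \left( \frac{\sqrt{(d + B^2)\left( 1 + \sum_{i=1}^d \Gamma_i^2 (b_i - a_i)^2 \right)}}{\epsilon} \right)^{d/2} \tag{3}$$

for $0 < \epsilon \le \epsilon_0 \sqrt{(d + B^2)\left( 1 + \sum_{i=1}^d \Gamma_i^2 (b_i - a_i)^2 \right)}$. It is immediate that the left side of (3) equals

$$\log N_{[\,]}\left( \frac{\epsilon}{A}, \mathcal{C}\left( D, \frac{B}{A}, \left( \frac{\Gamma_1}{A}, \dots, \frac{\Gamma_d}{A} \right) \right), L_\infty \right)$$

for any $A > 0$ so that for all $A > 0$ (3) is bounded above by

$$c \left( \frac{\sqrt{(d A^2 + B^2)\left( 1 + \sum_{i=1}^d \Gamma_i^2 (b_i - a_i)^2 / A^2 \right)}}{\epsilon} \right)^{d/2}$$

for $0 < \epsilon \le \epsilon_0 \sqrt{(d A^2 + B^2)\left( 1 + \sum_{i=1}^d \Gamma_i^2 (b_i - a_i)^2 / A^2 \right)}$. We pick

$$A^2 = \frac{B \sqrt{\sum_{i=1}^d \Gamma_i^2 (b_i - a_i)^2}}{\sqrt{d}},$$

which yields

$$\log N_{[\,]}\left( \epsilon, \mathcal{C}(D, B, (\Gamma_1, \dots, \Gamma_d)), L_\infty \right) \le c \left( \frac{B + \sqrt{d \sum_{i=1}^d \Gamma_i^2 (b_i - a_i)^2}}{\epsilon} \right)^{d/2}$$

if $0 < \epsilon \le \epsilon_0 \left( B + \sqrt{d \sum_{i=1}^d \Gamma_i^2 (b_i - a_i)^2} \right)$. Since

$$\sqrt{\sum_{i=1}^d \Gamma_i^2 (b_i - a_i)^2} \le \sum_{i=1}^d \Gamma_i (b_i - a_i) \le \sqrt{d \sum_{i=1}^d \Gamma_i^2 (b_i - a_i)^2},$$

which are basic facts about $\ell^p$ norms in $\mathbb{R}^d$, we are done showing the second inequality of the theorem. □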
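The optimization over the free scale $A$ in the proof above can be sanity-checked numerically. The sketch below assumes the scaled bound has numerator $\sqrt{(dA^2 + B^2)(1 + S/A^2)}$ with $S = \sum_i \Gamma_i^2 (b_i - a_i)^2$, and checks that $A^2 = B\sqrt{S}/\sqrt{d}$ minimizes it with minimum value $B + \sqrt{dS}$. The function names and the sample values of $d$, $B$, $S$ are hypothetical.

```python
import math


def scaled_numerator_sq(A2: float, d: int, B: float, S: float) -> float:
    """(d*A^2 + B^2) * (1 + S/A^2), where S = sum_i Gamma_i^2 (b_i - a_i)^2."""
    return (d * A2 + B * B) * (1.0 + S / A2)


def optimal_A2(d: int, B: float, S: float) -> float:
    """Minimizer of scaled_numerator_sq over A^2 > 0 (first-order condition)."""
    return B * math.sqrt(S) / math.sqrt(d)


d, B, S = 3, 2.0, 5.0  # illustrative values only
A2_star = optimal_A2(d, B, S)
value_at_optimum = scaled_numerator_sq(A2_star, d, B, S)
closed_form = (B + math.sqrt(d * S)) ** 2
```

Expanding the product gives $dA^2 + B^2 S / A^2 + dS + B^2$, which is convex in $A^2$; setting the derivative to zero gives the stated choice, and the two middle terms then each equal $B\sqrt{dS}$.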


3 Bracketing without Lipschitz Constraints 

In the previous section we bounded bracketing entropy for classes of functions with 
Lipschitz constraints. In this section we remove those Lipschitz constraints. 

3.1 Notation and Assumptions 

With Lipschitz constraints we could consider arbitrary domains $D$, but without the Lipschitz constraints we need more restrictions. We will now require that $D$ is a polytope, and, to begin with, we also assume that $D$ is simple. We will consider only the case $d \ge 2$ since the result is given when $d = 1$ in Dryanov (2009).

Assumption 1. Let $d \ge 2$ and let $D \subseteq \mathbb{R}^d$ be a simple convex polytope, meaning that all $(d - k)$-dimensional faces of $D$ have exactly $k$ incident facets.

It is well-known that the simplicial polytopes are dense in the class of all polytopes in the Hausdorff distance. The simple polytopes are dual to the simplicial ones, and are also dense in the class of all polytopes in the Hausdorff distance (page 82 of Grünbaum (1967)). Any convex polytope can be triangulated into $O(n^{\lfloor d/2 \rfloor})$ simplices (which are simple polytopes) if the polytope has $n$ vertices, see e.g. Dey and Pach (1998), and so one can translate our results to a general polytope $D$. However, then any geometric intuition provided by the constants in the bounds is lost.

We let $D = \cap_{j=1}^N E_j$ where $E_j := \{ x \in \mathbb{R}^d : \langle v_j, x \rangle \ge p_j \}$ are halfspaces with (inner) normal vectors $v_j$, and where $p_j \in \mathbb{R}$, for $j = 1, \dots, N$. Let $H_j := \{ x \in \mathbb{R}^d : \langle x, v_j \rangle = p_j \}$ be the corresponding hyperplanes. For $k \in \mathbb{N}$, let $J_k := \{ (j_1, \dots, j_k) \in \{1, \dots, N\}^k : j_1 < \cdots < j_k \}$ and $I_k := \{0, \dots, A\}^k$. For $j \in J_k$, let $G_j = \cap_{\alpha=1}^k H_{j_\alpha}$.

Any $G_j$, $j \in J_k$, is $(d - k)$-dimensional and so, by Fritz John's theorem (John (1948), see also Ball (1992) or Ball (1997)), contains a $(d - k)$-dimensional ellipsoid $A_j - x_j$ of maximal $(d - k)$-dimensional volume, such that

$$A_j - x_j \subseteq G_j - x_j \subseteq d (A_j - x_j) \tag{4}$$

for some point $x_j \in G_j$. Let $e_{k+1}, \dots, e_d$ be the orthonormal basis given by the axes of the ellipsoid $A_j - x_j$ and let $\gamma_{j,\alpha}/2$ be the radius of $A_j$ in the direction $e_\alpha$, meaning that $x_j \pm \gamma_{j,\alpha} e_\alpha / 2$ lies in the boundary of $A_j$. We will rely heavily on Fritz John's theorem to understand the size of $G_j$. Let $d^+(x, \partial G_j, e) := \inf_{K \ge 0} \{ K : \{x + K e\} \cap \partial G_j \ne \emptyset \}$ and let

u := 2“ 2 ( p+1 ) ( p+2 ) A min min — ^ (5) 

fee{l,...,d—1} j'eJ fc ,e6span{e fc+1 ,...,e d } Lfc, 2 

where 

^ (f'yi'Vjp') 

Lk ,2 := 1 V sup V' -{- (6) 

(a,^) 

and $\tilde f_\gamma$ are defined in Proposition 4.2. Then let

$$0 = \delta_0 < \delta_1 < \cdots < \delta_A < u = \delta_{A+1} < \delta_{A+2} = \infty \tag{7}$$

be a sequence to be defined later.

Let $\mathrm{Lin}\,P$ be the translated affine span of $P$, i.e. the space of all linear combinations of elements of $(P - x)$, for any $x \in P$. Note that $\mathrm{lin}\,P$ is commonly used to refer to the linear span of $P$ rather than of $P - x$, and thus to distinguish from this case, we use the notation "Lin" rather than "lin." For a point $x$, a set $H$, and a unit vector $v$, let

$$d(x, H, v) := \inf \{ |k| : x + k v \in H \}$$

be the distance from $x$ to $H$ in direction $v$, and for a set $E$, $d(E, H, v) := \inf_{x \in E} d(x, H, v)$. For $i = (i_1, \dots, i_k) \in I_k$ and $j = (j_1, \dots, j_k) \in J_k$ let

$$G_{i,j} := \{ x \in D : \delta_{i_\alpha} \le d(x, H_{j_\alpha}) < \delta_{i_\alpha + 1} \text{ for } \alpha = 1, \dots, N \}, \tag{8}$$

where for $\alpha > k$ we let $i_\alpha = A + 1$. These sets are not parallelotopes, since for $\alpha > k$, $\delta_{i_\alpha + 1} = \infty$. However, for any $x \in G_j$, $(G_{i,j} - x) \cap \mathrm{span}\{ v_{j_1}, \dots, v_{j_\beta} \}$, for $\beta \le k$, is contained in a $\beta$-dimensional parallelotope.

3.2 Main Results 

We want to bound the slope of functions $f \in \mathcal{C}(D, 1)|_{G_{i,j}}$ so that we can apply bracketing bounds on convex function classes with Lipschitz bounds. Note that each $G_{i,j}$ is distance $\delta_{i_\alpha}$ in the direction of $v_{j_\alpha}$ from $H_{j_\alpha}$, which means that if $f \in \mathcal{C}(D, 1)|_{G_{i,j}}$ then $f$ has Lipschitz constant bounded by $2/\delta_{i_\alpha}$ along the direction $v_{j_\alpha}$ towards $H_{j_\alpha}$. However, the vectors $v_{j_\alpha}$ are not orthonormal, so the distance from $G_{i,j}$ along $v_{j_\alpha}$ to a hyperplane other than $H_{j_\alpha}$ may be smaller than $\delta_{i_\alpha}$.

For each $G_{i,j}$ we will find an orthonormal basis such that $G_{i,j}$ is contained in a rectangle $R$ whose axes are given by the basis and whose lengths along those axes (i.e., widths) are bounded by a constant times the width in the direction of one of the normal vectors $v_{j_\alpha}$. Furthermore, the distance from $R$ along each basis vector to $\partial D$ will be bounded by the distance from $G_{i,j}$ along $v_{j_\alpha}$ to $H_{j_\alpha}$. This will give us control of both the Lipschitz parameters and the widths corresponding to the basis, and thus control of the size of bracketing for classes of convex functions.

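Proposition 3.1. Let Assumption 1 hold for a convex polytope $D$. For each $k \in \{0, \dots, d\}$, $i \in I_k$, $j \in J_k$, and each $G_{i,j}$, there is an orthonormal basis $e_{i,j} \equiv e := (e_1, \dots, e_d)$ of $\mathbb{R}^d$ such that for any $f \in \mathcal{C}(D, B)|_{G_{i,j}}$, $f$ has Lipschitz constant $2B/\delta_{i_\alpha}$ in the direction $e_\alpha$, where $\delta_{i_\alpha} = \delta_{A+1}$ if $k + 1 \le \alpha \le d$. Furthermore, for $\alpha = 1, \dots, k$, $e_{i,j,\alpha} \equiv e_\alpha$ satisfies

$$e_\alpha \in \mathrm{span}\{ v_{j_1}, \dots, v_{j_\alpha} \}, \quad e_\alpha \perp \mathrm{span}\{ v_{j_1}, \dots, v_{j_{\alpha - 1}} \}, \quad \text{and} \quad \langle e_\alpha, v_{j_\alpha} \rangle > 0,$$

and for $\alpha \in \{k + 1, \dots, d\}$, $e_\alpha \perp \mathrm{span}\{ v_{j_1}, \dots, v_{j_k} \}$.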

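Proof. Without loss of generality, for ease of notation we assume in this proof that

$$j_\beta = \beta \text{ for } \beta = 1, \dots, k,$$

and then that

$$\delta_{i_1} \le \delta_{i_2} \le \cdots \le \delta_{i_k} \le \delta_{i_{k+1}} = \cdots = \delta_{i_N},$$

where we let $i_\alpha = A + 1$ for $k < \alpha \le N$. That is, we assume that $H_1, \dots, H_k$ are the nearest hyperplanes to $G_{i,j}$, in order of increasing distance. To define the orthonormal basis vectors, we will use a Gram-Schmidt orthonormalization, proceeding according to increasing distances from $G_{i,j}$ to the hyperplanes $H_j$. Define $e_1 := v_1$ and for $1 < j \le k$, define $e_j$ inductively by

$$e_j \in \mathrm{span}\{ v_1, \dots, v_j \}, \quad e_j \perp \mathrm{span}\{ v_1, \dots, v_{j-1} \}, \quad \langle e_j, v_j \rangle > 0, \quad \text{and} \quad \|e_j\| = 1,$$

and let $\{ e_j \}_{j=k+1}^d$ be any orthonormal basis of $\mathrm{span}\{ v_1, \dots, v_k \}^\perp$.

For $\alpha \in \{1, \dots, k\}$, for any $x \in G_{i,j}$, since $d(x, H_\alpha, v)$ is smallest when $v$ is $v_\alpha$,

$$d(x, H_\alpha, e_\alpha) \ge d(x, H_\alpha, v_\alpha) \ge \delta_{i_\alpha},$$
$$d(x, H_j, e_\alpha) \ge d(x, H_j, v_j) \ge \delta_{i_j} \ge \delta_{i_\alpha}, \text{ for all } N \ge j > \alpha, \text{ and}$$
$$d(x, H_j, e_\alpha) = \infty > \delta_{i_\alpha} \text{ for } j < \alpha,$$

since $e_\alpha \perp \mathrm{span}\{ v_1, \dots, v_{\alpha - 1} \}$. Similarly, for $\alpha \in \{k + 1, \dots, d\}$,

$$d(x, H_j, e_\alpha) \ge d(x, H_j, v_j) \ge \delta_{A+1}, \text{ for all } N \ge j \ge k + 1, \text{ and}$$
$$d(x, H_j, e_\alpha) = \infty > \delta_{A+1} \text{ for } j \le k,$$

since $e_\alpha \perp \mathrm{span}\{ v_1, \dots, v_k \}$. Thus, we have $d(G_{i,j}, H_j, e_\alpha) \ge \delta_{i_\alpha}$ for $\alpha \in \{1, \dots, d\}$ and for $j \in \{1, \dots, N\}$. That is, we have shown

$$d(G_{i,j}, \partial D, e_\alpha) \ge \delta_{i_\alpha} \text{ for all } \alpha \in \{1, \dots, d\}. \tag{9}$$

Thus, if $f \in \mathcal{C}(D, B)|_{G_{i,j}}$, then for any $x \in G_{i,j}$, let $z_1 = x - \gamma_1 e_\alpha$ and $z_2 = x + \gamma_2 e_\alpha$, $\gamma_1, \gamma_2 > 0$, both be elements of $\partial G_{i,j}$, so that by convexity we have

$$\frac{-2B}{\delta_{i_\alpha}} \le \frac{f(z_1) - f(z_1 - \delta_{i_\alpha} e_\alpha)}{\delta_{i_\alpha}} \le \frac{f(x + k e_\alpha) - f(x)}{k} \le \frac{f(z_2 + \delta_{i_\alpha} e_\alpha) - f(z_2)}{\delta_{i_\alpha}} \le \frac{2B}{\delta_{i_\alpha}},$$

using (9). Thus, $f$ satisfies a Lipschitz constraint in the direction of $e_\alpha$. □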
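The Gram-Schmidt construction used in the proof can be sketched in a few lines. This is an illustration under assumptions, not the paper's construction verbatim: the normals are assumed already sorted by increasing distance to $G_{i,j}$, the basis of the orthogonal complement is obtained by continuing the orthonormalization on the standard basis vectors (any orthonormal completion would do), and the names `gram_schmidt_basis` and `dot` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt_basis(normals: Sequence[Sequence[float]], dim: int,
                       tol: float = 1e-12) -> List[Vector]:
    """Orthonormal basis e_1..e_dim of R^dim with e_a in span(v_1..v_a),
    e_a orthogonal to span(v_1..v_{a-1}) and <e_a, v_a> > 0 for the given
    linearly independent normals; the rest completes the basis."""
    standard = [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    basis: List[Vector] = []
    for v in list(normals) + standard:
        w = list(v)
        for e in basis:
            # subtract the component of w along each earlier basis vector
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
        if len(basis) == dim:
            break
    return basis


s = 1.0 / math.sqrt(2.0)
normals = [[1.0, 0.0, 0.0], [s, s, 0.0]]  # two non-orthogonal unit normals in R^3
basis = gram_schmidt_basis(normals, dim=3)
```

Because each $e_\alpha$ is the normalized residual of $v_\alpha$, the inner product $\langle e_\alpha, v_\alpha \rangle$ equals the residual norm and is automatically positive.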

Here is our main theorem. It gives a bracketing entropy of order $\epsilon^{-d/2}$ when $D$ is a fixed simple polytope. Its proof relies on embedding $G_{i,j}$ in a rectangle $R_{i,j}$ with axes given by Proposition 3.1. We need to control the distance of $G_{i,j}$ to $\partial D$, and we need to control the size of $R_{i,j}$ in terms of the widths along its axes. Then we can use the results of Section 2 on $R_{i,j}$ and thus on $G_{i,j}$. The study of the size of $R_{i,j}$ is somewhat lengthy, so we defer it to Section 4. The constant $S$ has an explicit form given in the proof of the theorem.

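Theorem 3.1. Let Assumption 1 hold for a convex polytope $D \subseteq \prod_{i=1}^d [a_i, b_i]$, for an integer $d \ge 2$. Fix $p \ge 1$. Then for some $\epsilon_0 > 0$ and for $0 < \epsilon \le \epsilon_0 B \left( \prod_{i=1}^d (b_i - a_i) \right)^{1/p}$,

$$\log N_{[\,]}\left( \epsilon, \mathcal{C}(D, B), L_p \right) \le S \left( \frac{B \left( \prod_{i=1}^d (b_i - a_i) \right)^{1/p}}{\epsilon} \right)^{d/2}$$

where $S$ is a constant depending on $d$ and $D$ (and on $u$, which is fixed by (5)).

Proof. First, we will reduce to the case where $D \subseteq [0, 1]^d$ and $B = 1$ by a scaling argument. Let $A$ be an affine map from $\prod_{i=1}^d [a_i, b_i]$ to $[0, 1]^d$, where $\tilde D$ is the image of $D$, and assume we have a bracketing cover $[\tilde l_1, \tilde u_1], \dots, [\tilde l_N, \tilde u_N]$ of $\mathcal{C}(\tilde D, 1)$. Let $l_i := B \, \tilde l_i \circ A$ and similarly for $u_i$, so that $[l_1, u_1], \dots, [l_N, u_N]$ form brackets for $\mathcal{C}(D, B)$. The $p$th power of their $L_p$ size is

$$\int_D (u_i(x) - l_i(x))^p \, dx = B^p \int_{\tilde D} (\tilde u_i(x) - \tilde l_i(x))^p \prod_{j=1}^d (b_j - a_j) \, dx.$$

Thus,

$$N_{[\,]}\left( \epsilon B \Big( \prod_{i=1}^d (b_i - a_i) \Big)^{1/p}, \mathcal{C}(D, B), L_p \right) \le N_{[\,]}\left( \epsilon, \mathcal{C}(\tilde D, 1), L_p \right),$$

so apply the theorem with $\eta = \epsilon / \left( B \left( \prod_{i=1}^d (b_i - a_i) \right)^{1/p} \right)$ for $\epsilon$. Note that the constant $S$ depends on $\tilde D$, the version of $D$ normalized to lie in $[0, 1]^d$.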

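We now assume $D \subseteq [0, 1]^d$ and $B = 1$. For a sequence $a_{i,k} > 0$ (constant over $j \in J_k$), to be defined later, let $a = \left( \sum_{k=0}^d \sum_{j \in J_k, i \in I_k} a_{i,k}^p \mathrm{Vol}_d(G_{i,j}) \right)^{1/p}$. By Assumption 1 and since $D \subseteq \cup_{k=0}^d \cup_{j \in J_k, i \in I_k} G_{i,j}$,

$$N_{[\,]}\left( a, \mathcal{C}(D, 1), L_p \right) \le \prod_{k=0}^d \prod_{j \in J_k, i \in I_k} N_{[\,]}\left( a_{i,k} \mathrm{Vol}_d(G_{i,j})^{1/p}, \mathcal{C}(D, 1)|_{G_{i,j}}, L_p \right),$$

as in (1). Now by Lemma 4.3, we can ignore all terms with $j \in J_k \setminus J_k^D$, where $J_k^D := \{ j \in J_k : \cap_{\alpha=1}^k H_{j_\alpha} \text{ is a } k\text{-face of } D \}$. Thus

$$\log N_{[\,]}\left( a, \mathcal{C}(D, 1), L_p \right) \le \sum_{k=0}^d \sum_{j \in J_k^D} \sum_{i \in I_k} \log N_{[\,]}\left( a_{i,k} \mathrm{Vol}_d(G_{i,j})^{1/p}, \mathcal{C}(D, 1)|_{G_{i,j}}, L_p \right).$$

First we compute the sum over $I_k$ for a fixed $j \in J_k$. By Proposition 3.1,

$$\mathcal{C}(D, 1)|_{G_{i,j}} \subseteq \mathcal{C}(G_{i,j}, 1, \Gamma_i, e) \tag{10}$$

where $\Gamma_i = (2/\delta_{i_1}, \dots, 2/\delta_{i_k}, 2/u, \dots, 2/u)$. Let $R_{i,j}$ be as in (25). That is, let $\rho_{j,\alpha} = w(G_j, e_\alpha)$, $L_{k,1}$ be given by (22), and let

$$R_{i,j} := \sum_{\alpha=1}^k \left[ -\alpha! (\delta_{i_\alpha + 1} - \delta_{i_\alpha}) e_\alpha, \; \alpha! (\delta_{i_\alpha + 1} - \delta_{i_\alpha}) e_\alpha \right] + \sum_{\alpha=k+1}^d \left[ -2 L_{k,1} \rho_{j,\alpha} e_\alpha, \; 2 L_{k,1} \rho_{j,\alpha} e_\alpha \right],$$

so that $G_{i,j} \subseteq x + R_{i,j}$ for any $x \in G_{i,j}$ by (26). Then by (10) (and the first inequality of Theorem 2.2) we bound

$$\sum_{i \in I_k} \log N_{[\,]}\left( a_{i,k} \mathrm{Vol}(G_{i,j})^{1/p}, \mathcal{C}(D, 1)|_{G_{i,j}}, L_p \right) \le \sum_{i \in I_k} \log N_{[\,]}\left( a_{i,k}, \mathcal{C}(G_{i,j}, 1, \Gamma_i, e), L_\infty \right). \tag{11}$$

We use the trivial bracket $[-1, 1]$ for any $G_{i,j}$ where $i_\alpha = 0$ for any $\alpha \in \{1, \dots, k\}$, and otherwise we use Theorem 2.2, which shows us that (11) is bounded by

$$\sum_{i_1=1}^A \cdots \sum_{i_k=1}^A c \left( \frac{1 + \sum_{\alpha=1}^k \frac{2 \, d! \, (\delta_{i_\alpha + 1} - \delta_{i_\alpha})}{\delta_{i_\alpha}} + \sum_{\alpha=k+1}^d \frac{8 L_{k,1} \rho_{j,\alpha}}{u}}{a_{i,k}} \right)^{d/2}. \tag{12}$$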

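For $i \in I_k$, we will let $a_{i,k} = 1$ if $i_\alpha = 0$ for any $\alpha \in \{1, \dots, k\}$, and otherwise we let

$$a_{(i_1, \dots, i_k)} := \prod_{\beta=1}^k a_{i_\beta} := \prod_{\beta=1}^k \epsilon^{1/k} \exp\left\{ -p \frac{(p+1)^{i_\beta - 2}}{(p+2)^{i_\beta - 1}} \log \epsilon \right\}, \quad \text{and}$$

$$\delta_i := \exp\left\{ p \left( \frac{p+1}{p+2} \right)^{i-1} \log \epsilon \right\} \quad \text{for } i = 1, \dots, A,$$

and $\delta_0 = 0$. Since $L_{k,1} \ge 1$, $L_{k,2} \ge 1$, and $u \le \rho_{j,\alpha} / L_{k,2}$ by (5) for all $k, i, j$ and $\alpha = k + 1, \dots, d$, we have $\sum_{\alpha=k+1}^d 8 L_{k,1} \rho_{j,\alpha} / u \le \prod_{\alpha=k+1}^d 8 L_{k,1} \rho_{j,\alpha} / u$ (using the fact that for $a, b \ge 2$, $ab \ge a + b$). Similarly, $\sum_{\alpha=1}^k 2 (\delta_{i_\alpha + 1} - \delta_{i_\alpha}) / \delta_{i_\alpha} \le \prod_{\alpha=1}^k 2 \delta_{i_\alpha + 1} / \delta_{i_\alpha}$ since $2 \delta_{i_\alpha + 1} / \delta_{i_\alpha} \ge 2$. Thus (12) is bounded above by

$$\sum_{i_1=1}^A \cdots \sum_{i_k=1}^A c \, (d!)^{d/2} \left( \Big( 1 + \prod_{\alpha=k+1}^d \frac{8 L_{k,1} \rho_{j,\alpha}}{u} \Big) \prod_{\beta=1}^k \frac{2 \delta_{i_\beta + 1}}{\delta_{i_\beta} a_{i_\beta}} \right)^{d/2}, \tag{13}$$

which is

$$c \, (d!)^{d/2} \left( 1 + \prod_{\alpha=k+1}^d \frac{8 L_{k,1} \rho_{j,\alpha}}{u} \right)^{d/2} \sum_{i_1=1}^A \cdots \sum_{i_k=1}^A \prod_{\beta=1}^k \left( \frac{2 \delta_{i_\beta + 1}}{\delta_{i_\beta} a_{i_\beta}} \right)^{d/2}.$$

For $i = 1, \dots, A$, let $\zeta_i := \sqrt{\epsilon^{1/k} \delta_{i+1} / (\delta_i a_i)}$, so that $\sum_{i_1=1}^A \cdots \sum_{i_k=1}^A \prod_{\beta=1}^k \left( \frac{2 \delta_{i_\beta + 1}}{\delta_{i_\beta} a_{i_\beta}} \right)^{d/2}$ equals

$$\prod_{\beta=1}^k \left( \sum_{i_\beta=1}^A \left( \frac{2 \delta_{i_\beta + 1}}{\delta_{i_\beta} a_{i_\beta}} \right)^{d/2} \right) = \epsilon^{-d/2} 2^{kd/2} B_u^k, \tag{14}$$

where

$$B_u := \sum_{i=1}^A \zeta_i^d \le 2 u^{d/(2(p+1)^2)}, \tag{15}$$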

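by Lemma 3.1.

Next, we will relate the term $c \, (d!)^{d/2} \left( 1 + \prod_{\alpha=k+1}^d 8 L_{k,1} \rho_{j,\alpha} / u \right)^{d/2}$ to $\mathrm{Vol}_{d-k}(G_j)$. Recall $A_j$ is the ellipsoid defined in (4) which has diameter (and width) in the $e_\alpha$ direction given by $\gamma_{j,\alpha}$. By (4),

$$\rho_{j,\alpha} \le d \, \gamma_{j,\alpha}.$$

The volume of $A_j$ is $\mathrm{Vol}_{d-k}(A_j) = \left( \prod_{\alpha=k+1}^d \gamma_{j,\alpha} / 2 \right) \pi^{(d-k)/2} / \Gamma((d-k)/2 + 1)$, where here $\Gamma(\cdot)$ denotes the gamma function. Thus, letting $C_d := (2d)^{d-k} \, \Gamma((d-k)/2 + 1) / \pi^{(d-k)/2}$, we have

$$\prod_{\alpha=k+1}^d \rho_{j,\alpha} \le C_d \mathrm{Vol}_{d-k}(A_j) \le C_d \mathrm{Vol}_{d-k}(G_j).$$

Then we have shown that (14) is bounded above by

$$c_d \, (d!)^{d/2} 2^{kd/2} \left( 1 + \Big( \frac{8}{u} \Big)^{d-k} C_d \mathrm{Vol}_{d-k}(G_j) \prod_{\alpha=k+1}^d L_{k,1} \right)^{d/2} B_u^k \, \epsilon^{-d/2}. \tag{16}$$

Then, gathering the constants together into $c_d$, we have shown

$$\sum_{i \in I_k} \log N_{[\,]}\left( a_{i,k} \mathrm{Vol}(G_{i,j})^{1/p}, \mathcal{C}(D, 1)|_{G_{i,j}}, L_p \right) \le c_d \, \epsilon^{-d/2} \left( \frac{\mathrm{Vol}_{d-k}(G_j) \prod_{\alpha=k+1}^d L_{k,1}}{u^{d-k}} \right)^{d/2} u^{kd/(2(p+1)^2)}.$$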

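Then the cardinality of the collection of brackets covering the entire domain $D$ is given by summing over $j \in J_k$ and $k \in \{0, \dots, d\}$.

We have computed the cardinality of the brackets. Now we bound their size. We have

$$a^p \le \sum_{k=0}^d (2 L_{k,1})^{d-k} \sum_{j \in J_k} \sum_{i \in I_k} a_{i,k}^p \mathrm{Vol}_{d-k}(G_j) \prod_{\alpha=1}^k \frac{\delta_{i_\alpha + 1} - \delta_{i_\alpha}}{\langle f_\alpha, v_{j_\alpha} \rangle} \tag{17}$$

by Proposition 4.2, with $f_\alpha$ defined there. Fixing $k$, we have

$$\sum_{j \in J_k} \sum_{i \in I_k} a_{i,k}^p \mathrm{Vol}_{d-k}(G_j) \prod_{\alpha=1}^k \frac{\delta_{i_\alpha + 1} - \delta_{i_\alpha}}{\langle f_\alpha, v_{j_\alpha} \rangle} \le \sum_{j \in J_k} \mathrm{Vol}_{d-k}(G_j) L_{j,3}^k \sum_{i_1=0}^A \cdots \sum_{i_k=0}^A \prod_{\alpha=1}^k a_{i_\alpha}^p \delta_{i_\alpha + 1}$$
$$\le \sum_{j \in J_k} \mathrm{Vol}_{d-k}(G_j) L_{j,3}^k \Big( \sum_{i_1=0}^A a_{i_1}^p \delta_{i_1 + 1} \Big) \cdots \Big( \sum_{i_k=0}^A a_{i_k}^p \delta_{i_k + 1} \Big),$$

where $L_{j,3} := \max_{\alpha \in \{1, \dots, k\}} 1 / \langle f_\alpha, v_{j_\alpha} \rangle$. We have

$$\sum_{i_\alpha=0}^A a_{i_\alpha}^p \delta_{i_\alpha + 1} = \epsilon^{p/k} \Big( 1 + \sum_{i_\alpha=1}^A \zeta_{i_\alpha}^2 \Big) =: \epsilon^{p/k} A_u \tag{18}$$

where $A_u \le 1 + 2 u^{1/(p+1)^2}$ by Lemma 3.1. Thus

$$\sum_{j \in J_k} \mathrm{Vol}_{d-k}(G_j) L_{j,3}^k \Big( \sum_{i_1=0}^A a_{i_1}^p \delta_{i_1 + 1} \Big) \cdots \Big( \sum_{i_k=0}^A a_{i_k}^p \delta_{i_k + 1} \Big) \le \epsilon^p A_u^k \sum_{j \in J_k} \mathrm{Vol}_{d-k}(G_j) L_{j,3}^k,$$

so by (17)

$$a \le \epsilon \left( \sum_{k=0}^d (2 L_{k,1})^{d-k} S_k^D A_u^k \right)^{1/p}$$

where $S_k^D = \sum_{j \in J_k} \mathrm{Vol}_{d-k}(G_j) L_{j,3}^k$. □

Lemma 3.1. For any $\gamma \ge 1$, with $A$ and $u$ given by (7),

$$\sum_{\alpha=1}^A \zeta_\alpha^\gamma \le 2 u^{\gamma/(2(p+1)^2)}.$$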


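Proof. Taking $\epsilon \le \epsilon_0 < 1$, $\zeta_\alpha \le 1$. Then for $\alpha = 2, \dots, A$,

$$\frac{\zeta_\alpha}{\zeta_{\alpha - 1}} = \exp\left\{ \frac{-p \log \epsilon}{2 (p+1)^2 (p+2)} \left( \frac{p+1}{p+2} \right)^{\alpha - 1} \right\} \ge \exp\left\{ \frac{-p \log \epsilon}{2 (p+1)^2 (p+2)} \left( \frac{p+1}{p+2} \right)^{A - 1} \right\} \ge \exp\left\{ \frac{-\log u}{2 (p+1)^2 (p+2)} \right\} =: R.$$

Then, $\zeta_\alpha^\gamma (R^\gamma - 1) \le \zeta_\alpha^\gamma R^\gamma - (R \zeta_{\alpha - 1})^\gamma$ so $\zeta_\alpha^\gamma \le \left( R^\gamma / (R^\gamma - 1) \right) (\zeta_\alpha^\gamma - \zeta_{\alpha - 1}^\gamma)$ and thus

$$\sum_{\alpha=1}^A \zeta_\alpha^\gamma \le \zeta_1^\gamma + \frac{R^\gamma}{R^\gamma - 1} \sum_{\alpha=2}^A (\zeta_\alpha^\gamma - \zeta_{\alpha - 1}^\gamma) = \zeta_1^\gamma + \frac{R^\gamma}{R^\gamma - 1} (\zeta_A^\gamma - \zeta_1^\gamma) \le \frac{R^\gamma}{R^\gamma - 1} \zeta_A^\gamma,$$

and $\zeta_A^\gamma = u^{\gamma/(2(p+1)^2)}$. Since $u \le \exp\left( -2 (p+1)^2 (p+2) \log 2 \right)$ by its definition (5), $R \ge 2$ so $R^\gamma / (R^\gamma - 1) \le 2$ for any $\gamma \ge 1$. □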
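The telescoping step in this proof is a general fact about sequences that grow at least geometrically: if $\zeta_\alpha / \zeta_{\alpha-1} \ge R > 1$, then $\sum_\alpha \zeta_\alpha^\gamma \le \frac{R^\gamma}{R^\gamma - 1} \zeta_A^\gamma$, and the factor is at most 2 once $R \ge 2$. The following sketch checks this on an arbitrary sequence; the sequence values and the function name are hypothetical and are not the $\zeta_\alpha$ of the paper.

```python
from typing import Sequence, Tuple


def geometric_growth_bound(zetas: Sequence[float], gamma: float) -> Tuple[float, float]:
    """Return (lhs, rhs) with lhs = sum_a zeta_a^gamma and
    rhs = R^gamma / (R^gamma - 1) * zeta_A^gamma, where R is the
    smallest ratio of consecutive terms (must exceed 1)."""
    R = min(b / a for a, b in zip(zetas, zetas[1:]))
    if R <= 1.0:
        raise ValueError("sequence must grow by a factor R > 1 at each step")
    factor = R ** gamma / (R ** gamma - 1.0)
    lhs = sum(z ** gamma for z in zetas)
    rhs = factor * zetas[-1] ** gamma
    return lhs, rhs


zetas = [0.01, 0.03, 0.1, 0.5]  # consecutive ratios are all at least 3
lhs1, rhs1 = geometric_growth_bound(zetas, gamma=1.0)
lhs2, rhs2 = geometric_growth_bound(zetas, gamma=2.0)
```

In both cases the sum is dominated by its last term up to a factor below 2, which is the mechanism behind the constant 2 in the lemma.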

Since simplices are simple polytopes, by triangulating any convex polytope $D$ into simplices, we can extend our theorem to any polytope $D$. The constant in the bound then depends on the triangulation of $D$.

Corollary 3.1. Fix $d \ge 1$ and $p \ge 1$. Let $D \subseteq \prod_{i=1}^d [a_i, b_i]$ be any convex polytope. Then for some $\epsilon_0 > 0$ and for $0 < \epsilon \le \epsilon_0 B \left( \prod_{i=1}^d (b_i - a_i) \right)^{1/p}$,

$$\log N_{[\,]}\left( \epsilon, \mathcal{C}(D, B), L_p \right) \lesssim \left( \frac{B \left( \prod_{i=1}^d (b_i - a_i) \right)^{1/p}}{\epsilon} \right)^{d/2}.$$

Proof. By the same scaling argument as in the proof of Theorem 3.1 we may assume $[a_i, b_i] = [0, 1]$ and $B = 1$. The $d = 1$ case is given by Dryanov (2009). Any convex polytope $D$ can be triangulated into $d$-dimensional simplices (see e.g. Dey and Pach (1998), Rothschild and Straus (1985)). We are done by applying Theorem 3.1 to each of those simplices, by (1). □

4 Proofs: Relating $G_{i,j}$ to a Hyperrectangle

4.1 Inscribing $G_{i,j}$ in a Hyperrectangle

Theorem 2.2 shows that the bracketing entropy of $\mathcal{C}(D, B, \Gamma)$ depends on the diameters of the hyperrectangle $\prod_{i=1}^d [a_i, b_i]$ circumscribing $D$. This is part of why bounding entropies on hyperrectangular domains is more straightforward than on non-hyperrectangular domains. In this section we prove Propositions 4.1 and 4.2, which show how to embed the domains $G_{i,j}$, which partition $D$, into hyperrectangles. We used this in the proof of Theorem 3.1 so we could apply Theorem 2.2.

The support function for a convex set $D$ is, for $x \in \mathbb{R}^d$,

$$h(D, x) := \max_{d \in D} \langle d, x \rangle.$$

Then the width function is, for $\|u\| = 1$,

$$w(D, u) := h(D, u) + h(D, -u),$$

which gives the distance between supporting hyperplanes of $D$ with inner normal vectors $u$ and $-u$, respectively, and let

$$w(D) = \sup_{\|u\| = 1} w(D, u).$$
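For a polytope given by its vertices, the support and width functions reduce to maxima over the vertex list, since a linear functional attains its maximum over a polytope at a vertex. The sketch below uses the unit square as an illustrative example; the function names are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def support(vertices: Sequence[Point], x: Point) -> float:
    """h(D, x) = max over D of <d, x>, evaluated on the vertices of D."""
    return max(sum(vi * xi for vi, xi in zip(v, x)) for v in vertices)


def width(vertices: Sequence[Point], u: Point) -> float:
    """w(D, u) = h(D, u) + h(D, -u) for a unit vector u."""
    minus_u = [-ui for ui in u]
    return support(vertices, u) + support(vertices, minus_u)


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
s = 1.0 / math.sqrt(2.0)
w_axis = width(square, (1.0, 0.0))  # side length of the square
w_diag = width(square, (s, s))      # length of the diagonal
```

The width along a coordinate axis is the side length 1, while the width along the diagonal direction is $\sqrt{2}$, which is also $w(D)$ for the square.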

Theorem 2.2 says that the bracketing entropy of convex functions on domain $D$ with Lipschitz constraints along directions $e_1, \dots, e_d$ depends on $w(D, e_i)$ (since that gives the maximum "rise" in "rise over run"). In our proof of Theorem 3.1 we partitioned $D$ into sets related to parallelotopes. Thus we will study the widths of parallelotopes. We know the widths of $G_{i,j}$ in the directions $v_{j_\alpha}$, which are $\delta_{i_\alpha + 1} - \delta_{i_\alpha}$, by definition.

Lemma 4.1. Let $V$ be a vector space of dimension $j \in \mathbb{N}$ containing linearly independent vectors $v_1, \dots, v_j$. Let $d_i > 0$ for $i = 1, \dots, j$, and let $P$ be the parallelotope defined by having $w(P, v_i) = d_i$. Then $P$ satisfies

$$w(P) \le j! \max_{1 \le i \le j} d_i.$$

Proof. The proof is by induction. The case $j = 1$ is trivial. Now assume the statement holds for $j - 1$ and we want to show it for $j$. For any $x, y \in \partial P$ we can find a path $x = x_0, x_1, \dots, x_n = y$ from $x$ to $y$ such that $x_i$ and $x_{i+1}$ are elements of (the boundary of) the same facet of $P$. $P$ has $2j$ facets; if $n > j$, then we can find a path through the complementary $2j - n$ facets, so that we may assume $n \le j$. By the induction hypothesis, $\|x_{i+1} - x_i\| \le (j - 1)! \max_{1 \le i \le j} d_i$, since any $(j - 1)$-dimensional facet is a parallelotope lying in a hyperplane with normal vector $v_i$ and widths $d_1, \dots, d_{i-1}, d_{i+1}, \dots, d_j$. Thus

$$\|x - y\| \le \sum_{i=1}^n \|x_i - x_{i-1}\| \le n (j - 1)! \max_{1 \le i \le j} d_i \le j! \max_{1 \le i \le j} d_i,$$

as desired. □

This gives a bound on the width of $G_{i,j}$ in the direction of each basis vector $e_\alpha$, $\alpha = 1, \dots, k$, from Proposition 3.1.

Proposition 4.1. Let Assumption 1 hold for a convex polytope $D$. Fix $k \in \{0, \dots, d\}$, $i \in I_k$, $j \in J_k$, and let $G_{i,j}$ be as in (8). Let $e_{i,j} \equiv e := (e_1, \dots, e_d)$, with $e_\alpha \in \mathbb{R}^d$, be the orthonormal basis from Proposition 3.1. Then

$$w(G_{i,j}, e_\alpha) \le \alpha! (\delta_{i_\alpha + 1} - \delta_{i_\alpha})$$

for $\alpha = 1, \dots, k$.

Proof. Let $\alpha \in \{1, \dots, k\}$ and let $w(G_{i,j}, e_\alpha)$ be given by the distance between the parallel supporting hyperplanes $H_1$ and $H_2$. The distance between $H_1$ and $H_2$ is equal to the distance between $H_1 \cap A$ and $H_2 \cap A$ where $A$ is any linear subspace containing the normal vector of $H_1$ and $H_2$. Thus, let $A = \mathrm{span}\{ v_{j_1}, \dots, v_{j_\alpha} \} \ni e_\alpha$. $G_{i,j}$ is contained in a parallelotope, $G_{i,j} \subseteq \cap_{\beta=1}^\alpha \tilde H_{j_\beta}$ where $\tilde H_{j_\beta} = \{ x \in \mathbb{R}^d : \delta_{i_\beta} \le \langle x, v_{j_\beta} \rangle \le \delta_{i_\beta + 1} \}$. Let $P = \cap_{\beta=1}^\alpha \tilde H_{j_\beta} \cap \mathrm{span}\{ v_{j_1}, \dots, v_{j_\alpha} \}$. Then $P$ is a parallelotope contained in the $\alpha$-dimensional vector space $V = \mathrm{span}\{ v_{j_1}, \dots, v_{j_\alpha} \}$ with widths $w(P, v_{j_\beta}) = \delta_{i_\beta + 1} - \delta_{i_\beta}$, for $\beta = 1, \dots, \alpha$. Thus we can apply Lemma 4.1 and conclude that

$$w(G_{i,j}, e_\alpha) \le w(P, e_\alpha) \le \alpha! (\delta_{i_\alpha + 1} - \delta_{i_\alpha}) \text{ for } \alpha = 1, \dots, k.$$

For the first inequality, we use the fact that $w\left( \cap_{\beta=1}^\alpha \tilde H_{j_\beta}, e_\alpha \right) \ge w(G_{i,j}, e_\alpha)$, and that $w\left( \cap_{\beta=1}^\alpha \tilde H_{j_\beta}, e_\alpha \right) = w(P, e_\alpha)$ since the distance between any two supporting hyperplanes $H_1$ and $H_2$ of $\cap_{\beta=1}^\alpha \tilde H_{j_\beta}$ is equal to the distance between $H_1 \cap A$ and $H_2 \cap A$ where $A$ is any linear subspace containing the normal vector of $H_1$ and $H_2$. □

We will rely on the following representation for a $k$-dimensional parallelotope. For sets $A$ and $B$, let $A + B = \{ a + b : a \in A, b \in B \}$.

Lemma 4.2. Let $V$ be a $k$-dimensional vector space, and $P := \cap_{\beta=1}^k E_\beta$ be a parallelotope where $E_\beta := \{ x \in V : 0 \le \langle x, v_\beta \rangle \le d_\beta \}$ for $k$ linearly independent normal unit vectors $v_\beta$. Let $H_\beta^+ := \{ x \in V : \langle x, v_\beta \rangle = d_\beta \}$ and $H_\beta^0 := \{ x \in V : \langle x, v_\beta \rangle = 0 \}$. Let $f_\beta$ be the unit vector lying in $\cap_{\gamma=1, \gamma \ne \beta}^k H_\gamma^0$ with $\langle f_\beta, v_\beta \rangle > 0$, for $\beta = 1, \dots, k$. Then $0$ is a vertex of $P$ and we can write

$$P = \sum_{\beta=1}^k [0, \tilde f_\beta]$$

where $\tilde f_\beta := d_\beta f_\beta / \langle f_\beta, v_\beta \rangle$, $[0, \tilde f_\beta] = \{ \lambda \tilde f_\beta : \lambda \in [0, 1] \}$.

Proof. Since the vectors $v_\beta$ are linearly independent, $\cap_{\beta=1}^k H_\beta^0 = \{0\}$ and the intersection of any $k - 1$ of the hyperplanes $H_\beta^0$ gives a 1-dimensional space, $\mathrm{span}\{ f_\beta \}$. A $k$-dimensional parallelotope can be written as the set-sum of the $k$ intervals emanating from the vertex, each given by the intersection of $k - 1$ of the hyperplanes $H_\beta^0$. See page 56 of Grünbaum (1967). Note that the $\tilde f_\beta$ satisfy $\langle \tilde f_\beta, v_\beta \rangle = d_\beta$ so that $\tilde f_\beta \in H_\beta^+$; thus the $k$ intervals are given by $[0, \tilde f_\beta]$, $\beta = 1, \dots, k$. □
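In the plane, the edge vectors of Lemma 4.2 can be computed explicitly: $f_1$ is the unit vector orthogonal to $v_2$ with $\langle f_1, v_1 \rangle > 0$, and similarly for $f_2$. The sketch below is a two-dimensional illustration only; the normals, widths, and the function name `edge_vectors_2d` are hypothetical choices.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def dot2(a: Vec2, b: Vec2) -> float:
    return a[0] * b[0] + a[1] * b[1]


def edge_vectors_2d(v1: Vec2, v2: Vec2, d1: float, d2: float) -> Tuple[Vec2, Vec2]:
    """Edge vectors (f~_1, f~_2) of the parallelogram
    {x : 0 <= <x, v_b> <= d_b, b = 1, 2}, emanating from the vertex 0."""

    def edge(v_own: Vec2, v_other: Vec2, d_own: float) -> Vec2:
        # unit vector orthogonal to the other normal
        f = (-v_other[1], v_other[0])
        norm = math.hypot(f[0], f[1])
        f = (f[0] / norm, f[1] / norm)
        s = dot2(f, v_own)
        if s < 0:  # orient so that <f, v_own> > 0
            f, s = (-f[0], -f[1]), -s
        # rescale so that <f~, v_own> = d_own
        return (d_own * f[0] / s, d_own * f[1] / s)

    return edge(v1, v2, d1), edge(v2, v1, d2)


r = 1.0 / math.sqrt(2.0)
v1, v2 = (1.0, 0.0), (r, r)  # non-orthogonal unit normals
f1, f2 = edge_vectors_2d(v1, v2, d1=1.0, d2=2.0)
```

The four vertices $0$, $\tilde f_1$, $\tilde f_2$, $\tilde f_1 + \tilde f_2$ then satisfy both slab constraints with equality on the appropriate sides, so the Minkowski sum $[0, \tilde f_1] + [0, \tilde f_2]$ is exactly the parallelogram.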


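The next proposition combines the previous ones to bound the widths of $G_{i,j}$ (i.e., to embed $G_{i,j}$ in a hyperrectangle).

Proposition 4.2. For each $k \in \{1, \dots, d - 1\}$, $i \in I_k$, $j \in J_k$, and each $G_{i,j}$, and the basis $e$ from Proposition 3.1, for $\alpha = k + 1, \dots, d$, we have

$$w(G_{i,j}, e_\alpha) \le 2 L_{k,1} w(G_j, e_\alpha) \tag{19}$$

and

$$\mathrm{Vol}_d(G_{i,j}) \le (2 L_{k,1})^{d-k} \mathrm{Vol}_{d-k}(G_j) \cdot \prod_{\alpha=1}^k \frac{\delta_{i_\alpha + 1} - \delta_{i_\alpha}}{\langle f_\alpha, v_{j_\alpha} \rangle} \tag{20}$$

where $L_{k,1}$ is given by (22) and $f_\alpha$ is the unit vector with $\langle f_\alpha, v_{j_\alpha} \rangle > 0$ lying in $\mathrm{span}\{ v_{j_1}, \dots, v_{j_k} \} \cap \left( \cap_{\beta=1, \beta \ne \alpha}^k H_\beta^+ \right)$, $\alpha = 1, \dots, k$, where $H_\beta^+ := \{ y \in \mathbb{R}^d : \langle y, v_{j_\beta} \rangle = \delta_{i_\beta + 1} - \delta_{i_\beta} \}$.

Proof. Take $k \in \{1, \dots, d - 1\}$. Let $x$ be an arbitrary fixed point, which we take to be $x = x_j$ (from (4)) for definiteness. Let $z = x + \sum_{\gamma=1}^k \tilde f_{j_\gamma}$, where $\tilde f_{j_\gamma} = d_{j_\gamma} f_{j_\gamma}$ where

$$0 \le d_{j_\gamma} \le (\delta_{i_\gamma + 1} - \delta_{i_\gamma}) / \langle f_{j_\gamma}, v_{j_\gamma} \rangle \tag{21}$$

and $f_{j_\gamma}$ is given by Lemma 4.2 for the $k$ linearly independent normal vectors $v_{j_1}, \dots, v_{j_k}$. Take an arbitrary $e \in \mathrm{span}\{ e_{k+1}, \dots, e_d \}$. Let $\lambda > 0$ be such that $\langle z + \lambda e, v_{j_\beta} \rangle$ is maximal over $\beta \in \{k + 1, \dots, N\}$ where $\lambda > 0$ is such that $z + \lambda e$ is in the boundary of $G_{i,j}$. (That is, $v_{j_\beta}$ corresponds to the first hyperplane $z + \lambda e$ intersects for $\lambda > 0$.) Note that this means $\langle v_{j_\beta}, e \rangle < 0$. Then $\langle z + \lambda e, v_{j_\beta} \rangle = p_{j_\beta} + u$ so, with the $\delta$ sequence in (7) and $G_{i,j}$ defined for any $u > 0$, we have

$$\lambda = \frac{p_{j_\beta} + u - \langle z, v_{j_\beta} \rangle}{\langle e, v_{j_\beta} \rangle} = \frac{\langle x, v_{j_\beta} \rangle - p_{j_\beta} - u + \sum_{\gamma=1}^k \langle \tilde f_{j_\gamma}, v_{j_\beta} \rangle}{\langle -e, v_{j_\beta} \rangle}.$$

Let

$$L_{k,1} := \sup_{e \in \mathrm{span}\{ e_{k+1}, \dots, e_d \}} 1 / \langle -e, v_{j_\beta} \rangle, \tag{22}$$

which is finite since $G_j$ is bounded. Then

$$\langle x, v_{j_\beta} \rangle - p_{j_\beta} \le d(x, H_{j_\beta}) \le d^+(x, \partial G_j, e)$$

since $H_{j_\beta}$ is the closest hyperplane to $x$ in the direction $e$. Now, by (5) and (6), we have shown

$$\lambda \le 2 L_{k,1} d^+(x, \partial G_j, e), \tag{23}$$

meaning that

$$(G_{i,j} - z) \cap \mathrm{span}\{ e_{k+1}, \dots, e_d \} \subseteq 2 L_{k,1} (G_j - x)$$

so we can conclude that $w(G_{i,j} - z, e_\alpha) \le 2 L_{k,1} w(G_j, e_\alpha)$ and $w(G_{i,j}, e_\alpha) \le 2 L_{k,1} w(G_j, e_\alpha)$ since $\langle z, e_\alpha \rangle = 0$ for all $d_{j_\gamma}$ given by the range (21), $\alpha = k + 1, \dots, d$, for $k = 1, \dots, d - 1$. It then also follows that

$$\mathrm{Vol}_d(G_{i,j}) \le (2 L_{k,1})^{d-k} \mathrm{Vol}_{d-k}(G_j) \cdot \mathrm{Vol}_k \left( \sum_{\alpha=1}^k [0, \tilde f_\alpha] \right), \tag{24}$$

where $\tilde f_\alpha = (\delta_{i_\alpha + 1} - \delta_{i_\alpha}) f_\alpha / \langle f_\alpha, v_{j_\alpha} \rangle$ and $f_\alpha$ is given in the statement of the proposition. This yields (20). □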

Lemma 4.3. Let Assumption 1 hold and let $G_{i,j}$ be as in (8). If $G_j = \emptyset$, then $G_{i,j} = \emptyset$.

Proof. This follows from Proposition 4.2 and its proof. □

The above provides a hyperrectangle containing $G_{i,j}$. Let $\rho_{j,\alpha} := w(G_j, e_\alpha)$ and then let

$$R_{i,j} := \sum_{\alpha=1}^k \left[ -\alpha! (\delta_{i_\alpha + 1} - \delta_{i_\alpha}) e_\alpha, \; \alpha! (\delta_{i_\alpha + 1} - \delta_{i_\alpha}) e_\alpha \right] + \sum_{\alpha=k+1}^d \left[ -2 L_{k,1} \rho_{j,\alpha} e_\alpha, \; 2 L_{k,1} \rho_{j,\alpha} e_\alpha \right]. \tag{25}$$

Then, for any $x \in G_{i,j}$ we have shown

$$G_{i,j} \subseteq x + R_{i,j}. \tag{26}$$